Vaccination rates


Konstantin Burkin

21 December 2021


Introduction


This work was created and submitted as the final project for the MSU course "Data analysis with Python" in the autumn semester of 2021. The project was written in Google Colab using Python 3.7.13. It is available in my GitHub repository, where the Colab notebook can be downloaded and the code inspected.

The goal of this project is to find an interesting dataframe, describe it statistically, remove or fill in the missing values, visualise the patterns in the data, and make a prediction based on them. The dataframe chosen here describes COVID-19 World Vaccination Progress. After cleaning the data and visualising vaccination rates in different countries and continents, I predicted the date on which the whole population of each country will be vaccinated.

The result of this project is a dataframe containing the names of the countries and the predicted dates on which their populations will be fully vaccinated.

Brief outline

  • Setting up the notebook environment: import of libraries, data import, data subsetting.
  • Exploration of the dataframe: description of data types, description of column names, count of NAs.
  • Filling missing values
  • Visualisation of absolute vaccination rates across continents and fully vaccinated ratio for each country.
  • Extrapolation and prediction of the date of fully vaccinated population.


Import of data and Python libraries


The analysis uses several Python libraries for data analysis and plotting:
  • Numpy
  • Pandas
  • Plotly
  • Sklearn
  • Datetime
  • Google Colab

The dataframe was downloaded and read from the GitHub repository as a CSV file. The dataframe is available on GitHub or Kaggle. The original dataframe contains 104 214 observations of 16 parameters. For this project only 8 parameters were selected and used.

The 5 random rows of the dataframe

Country     Date        Vaccinations  Vaccinated  Fully_Vaccinated  Vaccinations_Ratio  Vaccinated_Ratio  Fully_Vaccinated_Ratio
Canada      2021-07-22  47056217.0    26783674.0  20268756.0        123.61              70.36             53.24
Italy       2021-05-31  35434142.0    23815455.0  12280305.0        58.70               39.45             20.34
Palestine   2022-02-15  NaN           NaN         NaN               NaN                 NaN               NaN
Bangladesh  2021-12-29  NaN           NaN         NaN               NaN                 NaN               NaN
Montenegro  2022-03-28  668025.0      289643.0    281511.0          106.36              46.12             44.82
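
The import and subsetting step described above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the raw column names (`total_vaccinations`, `people_vaccinated`, and so on) are assumptions about the source CSV, and a small in-memory CSV stands in for the downloaded file.

```python
import io

import pandas as pd

# Tiny stand-in for the downloaded CSV. The raw column names are ASSUMED,
# not taken from this report; only the renamed columns appear in the text.
raw_csv = io.StringIO(
    "country,iso_code,date,total_vaccinations,people_vaccinated,"
    "people_fully_vaccinated,total_vaccinations_per_hundred,"
    "people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred\n"
    "Canada,CAN,2021-07-22,47056217,26783674,20268756,123.61,70.36,53.24\n"
    "Italy,ITA,2021-05-31,35434142,23815455,12280305,58.70,39.45,20.34\n"
    "Palestine,PSE,2022-02-15,,,,,,\n"
)

# Mapping from the assumed raw names to the eight names used in this report.
columns = {
    "country": "Country",
    "date": "Date",
    "total_vaccinations": "Vaccinations",
    "people_vaccinated": "Vaccinated",
    "people_fully_vaccinated": "Fully_Vaccinated",
    "total_vaccinations_per_hundred": "Vaccinations_Ratio",
    "people_vaccinated_per_hundred": "Vaccinated_Ratio",
    "people_fully_vaccinated_per_hundred": "Fully_Vaccinated_Ratio",
}

# Read only the needed columns, rename them, and fix their order.
df = pd.read_csv(raw_csv, usecols=list(columns)).rename(columns=columns)
df = df[list(columns.values())]

# A few random rows, as in the table above (seeded for reproducibility).
sample_rows = df.sample(n=2, random_state=0)
print(sample_rows)
```

With the real file, `raw_csv` would be replaced by the path or URL of the CSV in the repository.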


Exploratory Data Analysis


Statistical data description

The dataframe describes vaccination rates in 235 countries across the world. Vaccinations began at the end of 2020, and the vaccination program was still running in spring 2022. Several parameters describe the vaccination rates. The column names of the dataframe are listed below together with a description of each parameter. It is important to underline that the number of vaccinations can be larger than the population, since many vaccines require two shots. Moreover, many people travelled abroad to receive a better or additional vaccine.

Table of variables

Variable Description Datatype NAs
1 Country The country for which the vaccination rate is provided object 0
2 Date Date for the data entry object 0
3 Vaccinations The absolute number of immunizations in the country float64 50241
4 Vaccinated Total number of people vaccinated. Depending on the immunization scheme, a person receives one or more (typically 2) doses, so at a given moment the number of vaccinations may be larger than the number of people vaccinated float64 52683
5 Fully_Vaccinated The number of people that received the entire set of immunization according to the immunization scheme (typically 2) float64 55236
6 Vaccinations_Ratio The ratio between vaccination number and total population up to the date in the country float64 50241
7 Vaccinated_Ratio The ratio between population immunized and total population up to the date in the country float64 52683
8 Fully_Vaccinated_Ratio The ratio between population fully immunized and total population up to the date in the country float64 55236
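
The Datatype and NAs columns of the table above can be obtained with standard pandas calls. The sketch below assumes a dataframe named `df` with the report's column names; the three rows are a toy stand-in for the full data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the subsetted dataframe (three rows, three columns).
df = pd.DataFrame({
    "Country": ["Canada", "Italy", "Palestine"],
    "Vaccinations": [47056217.0, 35434142.0, np.nan],
    "Fully_Vaccinated": [20268756.0, np.nan, np.nan],
})

# One row per column: its data type and the number of missing values.
overview = pd.DataFrame({
    "Datatype": df.dtypes.astype(str),
    "NAs": df.isna().sum(),
})
print(overview)
```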

Statistical description of numeric data

Statistic  Vaccinations  Vaccinated    Fully_Vaccinated  Vaccinations_Ratio  Vaccinated_Ratio  Fully_Vaccinated_Ratio
mean       2.127387e+08  1.038874e+08  8.574138e+07      86.25               42.69             37.19
min        0.000000e+00  0.000000e+00  1.000000e+00      0.00                0.00              0.00
max        1.177483e+10  5.176686e+09  4.713085e+09      355.75              124.88            122.94
25%        8.281678e+05  4.884155e+05  3.877810e+05      17.83               12.25             7.49
50%        6.360699e+06  3.770917e+06  2.952149e+06      75.80               45.47             35.78
75%        3.963268e+07  2.219722e+07  1.822346e+07      142.02              69.76             63.88
Three columns that describe the ratio of vaccinated people (Vaccinations_Ratio, Vaccinated_Ratio, and Fully_Vaccinated_Ratio) contain values above 100%. For Vaccinations_Ratio this is expected, since some vaccines require two shots. However, the fact that some countries report more fully vaccinated people than their total population (Fully_Vaccinated_Ratio > 100%) seems erroneous.
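
A summary like the one above, together with a count of the suspicious values, can be produced along these lines. The numbers in this sketch are illustrative toy values (the last row is made up), so the output does not reproduce the table.

```python
import pandas as pd

# Illustrative ratio columns only; the last row is invented for the example.
df = pd.DataFrame({
    "Vaccinations_Ratio": [58.70, 123.61, 106.36, 210.0],
    "Vaccinated_Ratio": [39.45, 70.36, 46.12, 101.5],
    "Fully_Vaccinated_Ratio": [20.34, 53.24, 44.82, 99.0],
})

# Same statistics as in the table: mean, min, max and the quartiles.
stats = df.describe().loc[["mean", "min", "max", "25%", "50%", "75%"]]

# Number of rows in each ratio column that exceed 100%.
over_100 = (df > 100).sum()

print(stats.round(2))
print(over_100)
```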

Filling missing values

It is unlikely that vaccination rates change drastically between two data entries. Therefore, it seems logical to fill the missing values by linear interpolation. The plots in the next section make it possible to check that the filled values are plausible and that the linear approximation is appropriate.
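
A minimal sketch of such a fill is shown below, assuming the interpolation is done separately for each country so that values never leak from one country into another. The grouping by `Country` and the toy values are assumptions of this example, not a copy of the notebook code.

```python
import numpy as np
import pandas as pd

# Two toy countries with gaps in the fully vaccinated ratio.
df = pd.DataFrame({
    "Country": ["A"] * 5 + ["B"] * 5,
    "Date": list(pd.date_range("2021-01-01", periods=5, freq="D")) * 2,
    "Fully_Vaccinated_Ratio": [1.0, np.nan, 3.0, np.nan, 5.0,
                               np.nan, 10.0, np.nan, np.nan, 40.0],
})

# Interpolate within each country along the time axis.
df = df.sort_values(["Country", "Date"])
df["Fully_Vaccinated_Ratio"] = (
    df.groupby("Country")["Fully_Vaccinated_Ratio"]
      .transform(lambda s: s.interpolate(method="linear"))
)
print(df)
```

Gaps between two known values are filled on a straight line. With the default settings a gap before the first known value of a country stays NaN, which fits the data: there is nothing to interpolate from before a country's first report.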

Data visualisation

The first graph presents absolute vaccination numbers across six continents. Filling the missing values did not produce any unusual values or outliers. Therefore, the data can be used to predict the end of vaccination.

Before going to the next step, it is interesting to note that Europe, Asia, and North America were the first continents to develop vaccines against COVID-19 and deploy full-scale vaccination programs. Other continents, such as Africa, South America, and Australia, lagged behind in vaccination rates, since they had neither a developed pharmaceutical industry nor enough resources and had to wait for vaccine supplies from developed countries.

These patterns can be seen in the graphs. Europe, Asia, and North America have had higher vaccination numbers since the end of 2020.
The ratio of fully vaccinated people is plotted below for each country. The country of interest can be chosen from the dropdown menu.

Again, it is evident that although vaccinations began at the end of 2020, many countries started their vaccination programs much later. At that time only developed Asian, European, and North American countries could financially and logistically afford to begin vaccinations. Less developed countries started to receive vaccines later, at the beginning of 2021.


Prediction


  • Prediction of the date when the population of any continent is fully vaccinated
  • The prediction is made with a linear approximation fitted to the most recent data
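
The extrapolation idea can be sketched as follows: fit a straight line to the last few points of the fully vaccinated ratio and solve for the day on which the line reaches 100%. The function name, the 30-day window, and the use of `np.polyfit` are assumptions of this example; a linear regression from Sklearn would give the same fitted line.

```python
import numpy as np
import pandas as pd


def predict_full_vaccination_date(dates, ratios, window=30, target=100.0):
    """Fit a line to the last `window` points and extrapolate to `target` %.

    Hypothetical helper for illustration. Returns None when the recent
    trend is flat or decreasing, because the target is then never reached.
    """
    dates = pd.to_datetime(pd.Series(dates)).reset_index(drop=True)
    ratios = pd.Series(ratios, dtype="float64").reset_index(drop=True)

    # Keep only the most recent observations.
    recent_dates = dates.iloc[-window:]
    recent_ratios = ratios.iloc[-window:]

    # Use days since the start of the window as the regressor.
    start = recent_dates.iloc[0]
    days = (recent_dates - start).dt.days.to_numpy()

    slope, intercept = np.polyfit(days, recent_ratios.to_numpy(), deg=1)
    if slope <= 0:
        return None

    # Solve intercept + slope * day = target for day.
    days_to_target = (target - intercept) / slope
    return start + pd.Timedelta(days=int(round(days_to_target)))


# Synthetic example: the ratio grows by 0.5 percentage points per day.
example_dates = pd.date_range("2022-01-01", periods=40, freq="D")
example_ratios = 50.0 + 0.5 * np.arange(40)
predicted = predict_full_vaccination_date(example_dates, example_ratios)
print(predicted.date())
```

Restricting the fit to recent data matters because vaccination curves flatten over time; a line fitted to the whole history would overestimate the current pace.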


Results


In this project...
  • Exploratcs.
  • Correlation ames.
  • Correlaly.
  • 6 ctions.



